Quality control report

Quality control report overview

  • Cohort overview
  • Processing of transcriptomics
  • Processing of proteomics
  • Multi-omics object

Cohort overview

Table 1

                      Control                 AD                      Overall
                      (N=20)                  (N=24)                  (N=44)
age
  Mean (SD)           79.9 (7.70)             77.5 (8.24)             78.6 (7.99)
  Median [Min, Max]   79.5 [65.0, 92.0]       77.5 [64.0, 92.0]       78.5 [64.0, 92.0]
sex
  F                   11 (55.0%)              12 (50.0%)              23 (52.3%)
  M                   9 (45.0%)               12 (50.0%)              21 (47.7%)
diagnosis
  Control             20 (100%)               0 (0%)                  20 (45.5%)
  AD                  0 (0%)                  24 (100%)               24 (54.5%)
Braak
  0                   5 (25.0%)               0 (0%)                  5 (11.4%)
  1                   3 (15.0%)               0 (0%)                  3 (6.8%)
  2                   12 (60.0%)              0 (0%)                  12 (27.3%)
  3                   0 (0%)                  3 (12.5%)               3 (6.8%)
  4                   0 (0%)                  1 (4.2%)                1 (2.3%)
  5                   0 (0%)                  8 (33.3%)               8 (18.2%)
  6                   0 (0%)                  12 (50.0%)              12 (27.3%)
amyloid
  Mean (SD)           0.422 (0.333)           3.61 (2.68)             2.70 (2.69)
  Median [Min, Max]   0.334 [0.0456, 0.968]   3.13 [0.0577, 8.38]     1.48 [0.0456, 8.38]
  Missing             12 (60.0%)              4 (16.7%)               16 (36.4%)

Dataframe

Transcriptomics

Processing parameters

Minimum count: 10
Minimum percentage of sample with transcript: 0.5
Filter for protein coding genes only: TRUE
Transformation method: rlog
Apply batch correction: TRUE
Batch variable name: No batch
Remove sample outliers: TRUE
Dependent variable: diagnosis
Dependent variable levels: Control, AD
Covariates: age, sex, PMD

Metadata

Raw data

Mean value per feature

Missing value per sample

Protein coding genes filtering

Keep only protein coding genes in the analysis: TRUE
Number of non protein coding genes identified: 39155
Proportion of genes kept: 0.3350486 (19729 of 58884; 66.5% filtered out)
Number of protein coding genes kept for analysis: 19729

Further Filtering

Minimum count: 10
Minimum percentage of sample with transcript: 0.5
Filter for protein coding genes only: TRUE

Number of genes filtered: 4884
Proportion of genes filtered out: 0.2475544
Number of genes kept for analysis: 14845
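
As a rough illustration of how the minimum count and minimum sample proportion interact, the sketch below keeps a gene when at least the required proportion of samples reach the minimum count. This is one common reading of these two parameters and is not necessarily the exact rule Omix applies; the function name `filter_low_counts` and the toy matrix are hypothetical.

```python
def filter_low_counts(counts, min_count=10, min_sample_fraction=0.5):
    """Keep genes whose count is >= min_count in at least
    min_sample_fraction of samples.

    counts: dict mapping gene id -> list of raw counts (one per sample).
    Assumed interpretation of the report's parameters, not Omix's own code.
    """
    kept = {}
    for gene, values in counts.items():
        n_pass = sum(1 for v in values if v >= min_count)
        if n_pass / len(values) >= min_sample_fraction:
            kept[gene] = values
    return kept


# Toy data (hypothetical gene names and counts), four samples per gene.
toy_counts = {
    "geneA": [25, 40, 12, 0],  # 3 of 4 samples pass: kept
    "geneB": [0, 3, 11, 2],    # 1 of 4 samples pass: dropped
    "geneC": [10, 10, 0, 0],   # exactly 2 of 4 samples pass: kept
}
filtered = filter_low_counts(toy_counts)
removed_fraction = 1 - len(filtered) / len(toy_counts)
```

Here `removed_fraction` plays the role of the proportion of genes filtered out reported above.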

Mean values per feature after filtering

Mean values per sample after filtering

Genes filtered out

Transformation

Transformation method: rlog

Batch correction

Apply batch correction: TRUE
Batch variable name: No batch

Outliers

Remove sample outliers: TRUE

Sample outliers

Number of sample outliers: 0
Percentage of sample outliers: 0

No sample outlier detected!

Overview pre/post processing

Dimension of initial count matrix versus filtered matrix

After filtering, 44039 genes were removed from the analysis. The final matrix for analysis consisted of 14845 genes and 29 samples.
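
The gene counts reported in this section can be reconciled with a few lines of arithmetic. The sketch below is only a bookkeeping check: the variable names are hypothetical, the raw total of 58884 is taken from the rna_raw slot of the multi-omics object, and the non protein coding count is read as 39155.

```python
# Bookkeeping check on the transcriptomics filtering steps.
# All figures are copied from this report; variable names are illustrative.
raw_genes = 58884          # rows of rna_raw
non_coding = 39155         # non protein coding genes identified
low_count_removed = 4884   # genes removed by the count filter

after_coding_filter = raw_genes - non_coding
final_genes = after_coding_filter - low_count_removed
total_removed = raw_genes - final_genes

# Proportions corresponding to the two values printed in the report.
fraction_kept_coding = after_coding_filter / raw_genes
fraction_low_count = low_count_removed / after_coding_filter
```

Under these assumptions the two filtering steps account for all genes removed between the raw and the processed matrix.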

Raw and processed counts

Before processing

After processing

PCA

Before processing

After processing

Correlation to clinical data

Screeplot

Correlation heatmap

Sample to sample distances

Highly expressed features

Heatmap of the 20 most expressed genes

Violin plot

Highly variable genes

Heatmap of the 20 most variable genes

Violin plot

Proteomics

Processing parameters

Apply protein filtering: TRUE
Minimum percentage of sample with non missing protein abundance: 0.5
Imputation method: minimum_value
Apply batch correction: TRUE
Batch correction method: median_centering
Batch variable name: No batch
Remove protein outliers: TRUE
Remove sample outliers: TRUE
Denoising: TRUE
Dependent variable: diagnosis
Dependent variable levels: Control, AD
Covariates for denoising: PMD, sex, age

Metadata

Raw data

Missing value per feature

Missing value per sample

Filtering

Apply protein filtering: TRUE
Minimum percentage of sample with non missing protein abundance: 0.5

With these parameters, a protein is kept only if abundance data are available for it in at least 50% of samples.

Number of proteins filtered: 322
Proportion of proteins filtered out: 0.0997522
Number of proteins kept for analysis: 2906
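
A minimal sketch of a missing-value filter of this kind is given below, assuming that missing abundances are represented as `None` and that the threshold is applied per protein across all samples. The function name `filter_by_missingness` and the toy data are hypothetical, and the code is not taken from Omix.

```python
def filter_by_missingness(abundance, min_present_fraction=0.5):
    """Keep proteins with a non-missing abundance in at least
    min_present_fraction of samples.

    abundance: dict mapping protein id -> list of values, None = missing.
    Illustrative only; the representation of missing values is assumed.
    """
    kept = {}
    for protein, values in abundance.items():
        n_present = sum(1 for v in values if v is not None)
        if n_present / len(values) >= min_present_fraction:
            kept[protein] = values
    return kept


# Toy data (hypothetical protein ids), four samples per protein.
toy_abundance = {
    "P1": [1.0, None, 2.0, 3.0],     # 3 of 4 present: kept
    "P2": [None, None, None, 4.0],   # 1 of 4 present: dropped
    "P3": [None, 1.5, None, 2.5],    # exactly 2 of 4 present: kept
}
kept_proteins = filter_by_missingness(toy_abundance)
```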

Remaining missing values per feature after filtering

Remaining missing values per sample after filtering

Proteins filtered out

Imputation

Imputation method: minimum_value
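
The report does not state which minimum is used for imputation. The sketch below assumes the simplest variant, in which each missing value is replaced by the smallest observed abundance of the same protein; a global or per-sample minimum would be an equally plausible reading. The function name `impute_minimum` and the use of `None` for missing values are assumptions.

```python
def impute_minimum(abundance):
    """Replace missing values (None) with the per-protein minimum.

    abundance: dict mapping protein id -> list of values, None = missing.
    Assumed variant of minimum-value imputation, not Omix's own code.
    """
    imputed = {}
    for protein, values in abundance.items():
        observed = [v for v in values if v is not None]
        if not observed:
            raise ValueError(f"{protein} has no observed values to impute from")
        floor = min(observed)
        imputed[protein] = [floor if v is None else v for v in values]
    return imputed


# Toy data with hypothetical protein ids.
toy_missing = {
    "P1": [5.0, None, 7.0],
    "P2": [None, None, 3.5],
}
toy_imputed = impute_minimum(toy_missing)
```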

Batch correction

Batch correction method: median_centering
Batch variable name: No batch
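
For orientation, median centering is commonly understood as subtracting, for each protein, the median of its batch from every value in that batch. The sketch below follows that reading for a single protein. It is an assumption about the method, not a description of the Omix implementation, and the function name and toy batch labels are hypothetical.

```python
from statistics import median


def median_center_by_batch(values, batches):
    """Subtract the within-batch median from each value.

    values: log2 abundances of one protein, one per sample.
    batches: batch label of each sample, same order as values.
    Illustrative reading of 'median_centering'; details are assumed.
    """
    batch_medians = {}
    for label in set(batches):
        in_batch = [v for v, b in zip(values, batches) if b == label]
        batch_medians[label] = median(in_batch)
    return [v - batch_medians[b] for v, b in zip(values, batches)]


# Toy example: two batches with an offset of 10 between them.
toy_values = [10.0, 12.0, 14.0, 20.0, 22.0, 24.0]
toy_batches = ["b1", "b1", "b1", "b2", "b2", "b2"]
centered = median_center_by_batch(toy_values, toy_batches)
```

If all samples carry the same batch label, this reduces to centering each protein on its overall median.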

Outliers

Remove protein outliers: TRUE
Remove sample outliers: TRUE

Feature outliers

Number of protein outliers: 266
Proportion of protein outliers: 0.0915348

Sample outliers

Number of sample outliers: 0
Percentage of sample outliers: 0

No sample outlier detected!

Overview pre/post processing

Dimension of initial protein abundance matrix versus filtered matrix

After filtering and outlier removal, 588 proteins were removed from the analysis (322 by the missing-value filter and 266 as feature outliers). The final matrix for analysis consisted of 2640 proteins and 39 samples.

Log2 Protein abundance

Before processing

After processing

PCA

Before processing

After processing

Correlation to clinical data

Screeplot

Correlation heatmap

Sample to sample distances

Highly abundant features

Heatmap of the log2 abundance of the 20 most abundant proteins.

Violin plot

Highly variable features

Heatmap of the log2 abundance of the 20 most variable proteins

Violin plot

Multimodal object

Overview

ExperimentList class object of length 4:
  • [1] rna_raw: SummarizedExperiment with 58884 rows and 29 columns
  • [2] protein_raw: SummarizedExperiment with 3228 rows and 39 columns
  • [3] protein_processed: data.frame with 2640 rows and 39 columns
  • [4] rna_processed: data.frame with 14845 rows and 29 columns

Transcriptomics & proteomics slots

Intersection of samples & genes

Correlation of mRNA expression and protein abundance

Dataframe

Distribution

Features with highest/lowest correlation

Top 20 High

Top 20 Low

Top gene

The highest correlation between mRNA expression and protein abundance is observed for the RPH3A gene, with a correlation of 0.8859244.
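
The report does not say which correlation coefficient is used. As a sketch of how a gene-wise mRNA/protein correlation can be computed, the example below assumes Pearson correlation over the samples shared by both modalities. The function names, the nested-dict layout, and the toy gene and sample names are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length lists with non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def gene_wise_correlation(rna, protein, min_samples=3):
    """Correlate mRNA and protein values per gene over shared samples.

    rna, protein: dict gene -> dict sample id -> value.
    Genes missing from either modality, or with fewer than min_samples
    shared samples, are skipped. Assumed procedure, not Omix's own code.
    """
    result = {}
    for gene in rna.keys() & protein.keys():
        shared = sorted(rna[gene].keys() & protein[gene].keys())
        if len(shared) < min_samples:
            continue
        x = [rna[gene][s] for s in shared]
        y = [protein[gene][s] for s in shared]
        result[gene] = pearson(x, y)
    return result


# Toy data: geneZ has no protein measurement and is skipped.
toy_rna = {
    "geneX": {"s1": 1.0, "s2": 2.0, "s3": 3.0},
    "geneY": {"s1": 1.0, "s2": 2.0, "s3": 3.0},
    "geneZ": {"s1": 4.0, "s2": 5.0, "s3": 6.0},
}
toy_protein = {
    "geneX": {"s1": 2.0, "s2": 4.0, "s3": 6.0, "s4": 9.0},
    "geneY": {"s1": 3.0, "s2": 2.0, "s3": 1.0},
}
correlations = gene_wise_correlation(toy_rna, toy_protein)
top_gene = max(correlations, key=correlations.get)
```

Restricting each gene to the shared samples matters here because the transcriptomics and proteomics matrices do not cover the same number of samples (29 and 39 respectively).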

Transcriptomics/proteomics PCA correlations


Omix v1.0.0 – 2023-06-15 14:32:21

 

A report by Omix